Amendments to the Claims

This listing of claims will replace all prior 
versions, and listings, of claims in the application: 
Listing of Claims: 

1. (Currently amended) A method for speech synthesis, 
comprising:

providing a segment inventory comprising, for a plurality 
of speech segments, respective sequences of feature vectors, 
by estimating spectral envelopes of input speech signals 
corresponding to the speech segments in a succession of time 
intervals during each of the speech segments, and integrating 
the spectral envelopes over a plurality of window functions in 
a frequency domain so as to determine vector elements of the 
feature vectors; 

receiving phonetic and prosodic information indicative of 
an output speech signal to be generated; 

selecting the sequences of feature vectors from the 
inventory responsive to the phonetic and prosodic information; 

processing the selected sequences of feature vectors so 
as to generate a concatenated output series of feature vectors 
in a frequency domain;

computing a series of complex line spectra of the output 
signal from the series of the feature vectors; and 

transforming the complex line spectra to a time domain
speech signal for output.

2. (Original) A method according to claim 1, wherein
providing the segment inventory comprises providing segment 
information comprising respective phonetic identifiers of the 
segments, and wherein selecting the sequences of feature 
vectors comprises finding the segments whose phonetic 
identifiers are close to the received phonetic information. 

3. (Original) A method according to claim 2, wherein the 
segments comprise lefemes, and wherein the phonetic
identifiers comprise lefeme labels.

4. (Original) A method according to claim 2, wherein the 
segment information further comprises one or more prosodic 
parameters with respect to each of the segments, and wherein 
selecting the sequences of feature vectors comprises finding 
the segments whose one or more prosodic parameters are close 
to the received prosodic information. 

5. (Original) A method according to claim 4, wherein the one 
or more prosodic parameters are selected from a group of 
parameters consisting of a duration, an energy level and a 
pitch of each of the segments. 

6. (Original) A method according to claim 1, wherein the
feature vectors comprise auxiliary vector elements indicative 
of further features of the speech segments, in addition to the 
elements determined by integrating the spectral envelopes of 
the input speech signals. 

7. (Original) A method according to claim 6, wherein the 
auxiliary vector elements comprise voicing vector elements 
indicative of a degree of voicing of frames of the 
corresponding speech segments, and wherein computing the 
complex line spectra comprises reconstructing the output 
speech signal with the degree of voicing indicated by the 
voicing vector elements. 

8. (Original) A method according to claim 7, wherein 
receiving the prosodic information comprises receiving pitch 
values, and wherein reconstructing the output speech signal 
comprises adjusting a frequency spectrum of the output speech 
signal responsive to the pitch values.

9. (Original) A method according to claim 1, wherein 
selecting the sequences of feature vectors comprises: 

selecting candidate segments from the inventory; 

computing a cost function for each of the candidate 
segments responsive to the phonetic and prosodic information 
and to the feature vectors of the candidate segments; and 

selecting the segments so as to minimize the cost
function.
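
For illustration only, the selection steps of claim 9 can be sketched as a small dynamic-programming search. This is a minimal sketch under stated assumptions, not the applicant's implementation: the `Candidate` fields, the mismatch penalty of 10.0, and the use of a squared distance between boundary feature vectors as a join term are all hypothetical choices made here for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Candidate:
    label: str              # phonetic identifier of the stored segment
    duration: float         # one prosodic parameter, in seconds
    first: Sequence[float]  # first feature vector of the segment
    last: Sequence[float]   # last feature vector of the segment


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def target_cost(c: Candidate, label: str, duration: float) -> float:
    """Hypothetical cost: phonetic mismatch penalty plus duration mismatch."""
    return (0.0 if c.label == label else 10.0) + abs(c.duration - duration)


def select(targets: List[Tuple[str, float]],
           candidates: List[List[Candidate]]) -> List[int]:
    """Return one candidate index per target position, minimizing the sum of
    target costs and join costs between consecutive segments."""
    best = [[target_cost(c, *targets[0]) for c in candidates[0]]]
    back = []
    for t in range(1, len(targets)):
        row, ptr = [], []
        for c in candidates[t]:
            # Cost of reaching c from each candidate at the previous position.
            joins = [best[-1][j] + sq_dist(p.last, c.first)
                     for j, p in enumerate(candidates[t - 1])]
            j = min(range(len(joins)), key=joins.__getitem__)
            row.append(joins[j] + target_cost(c, *targets[t]))
            ptr.append(j)
        best.append(row)
        back.append(ptr)
    # Trace the cheapest path backwards.
    i = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [i]
    for ptr in reversed(back):
        i = ptr[i]
        path.append(i)
    return path[::-1]
```

The join term is what makes the cost depend on the feature vectors of the candidates, in addition to the phonetic and prosodic targets.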

10. (Original) A method according to claim 1, wherein 
concatenating the selected sequences of feature vectors 
comprises adjusting the feature vectors responsive to the 
prosodic information.

11. (Original) A method according to claim 10, wherein the 
prosodic information comprises respective durations of the 
segments to be incorporated in the output speech signal, and 
wherein adjusting the feature vectors comprises removing one 
or more of the feature vectors from the selected sequences so 
as to shorten the durations of one or more of the segments. 

12. A method according to claim 10, wherein the prosodic 
information comprises respective durations of the segments to 
be incorporated in the output speech signal, and wherein 
adjusting the feature vectors comprises adding one or more 
further feature vectors to the selected sequences so as to 
lengthen the durations of one or more of the segments. 
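
For illustration only, the duration adjustments of claims 11 and 12 can be sketched as a resampling of a segment's frame sequence. This is a minimal sketch under stated assumptions: the function name `adjust_duration` is hypothetical, and the "further feature vectors" are assumed here to be repeats of neighboring frames, which the claims do not require.

```python
from typing import List, Sequence


def adjust_duration(frames: List[Sequence[float]],
                    target_len: int) -> List[Sequence[float]]:
    """Map a sequence of feature vectors onto target_len frames.

    A shorter target drops frames (claim 11); a longer target repeats
    frames (one possible reading of claim 12).
    """
    if not frames or target_len <= 0:
        raise ValueError("need at least one frame and a positive target length")
    n = len(frames)
    # Uniform index mapping from output positions to input frames.
    return [frames[(i * n) // target_len] for i in range(target_len)]
```

With four input frames, a target length of 2 keeps frames 0 and 2, while a target length of 6 yields frames 0, 0, 1, 2, 2, 3.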

13. (Original) A method according to claim 10, wherein the 
prosodic information comprises respective energy levels of the 
segments to be incorporated in the output speech signal, and 
wherein adjusting the feature vectors comprises altering one 
or more of the vector elements so as to adjust the energy
levels of one or more of the segments.

14. (Original) A method according to claim 1, wherein 
processing the selected sequences comprises adjusting the 
vector elements so as to provide a smooth transition between 
the segments in the time domain signal. 

15. (Original) A method according to claim 1, wherein the 
vector elements comprise Mel Frequency Cepstral Coefficients 
of the speech segments, determined based on the integrated 
spectral envelopes.

16. (Currently amended) A method for speech synthesis, 
comprising:

receiving an input speech signal containing a set of 
speech segments; 

estimating spectral envelopes of the input speech signal 
in a succession of time intervals during each of the speech 
segments;

integrating the spectral envelopes over a plurality of 
window functions in a frequency domain so as to determine 
elements of feature vectors corresponding to the speech 
segments; and 

reconstructing an output speech signal by concatenating 
the feature vectors corresponding to a sequence of the speech 
segments to form a series in a frequency domain, computing a
series of complex line spectra of the output signal from the 
series of feature vectors, and transforming the complex line 
spectra to a time domain signal.

17. (Original) A method according to claim 16, wherein 
receiving the input speech signal comprises dividing the input 
speech signal into the segments and determining segment 
information comprising respective phonetic identifiers of the 
segments, and wherein reconstructing the output speech signal 
comprises selecting the segments whose feature vectors are to 
be concatenated responsive to the segment information 
determined with respect to the segments. 

18. (Original) A method according to claim 17, wherein 
dividing the input speech signal into the segments comprises 
dividing the signal into lefemes, and wherein the phonetic
identifiers comprise lefeme labels.

19. (Original) A method according to claim 17, wherein 
determining the segment information further comprises finding 
respective segment parameters including one or more of a 
duration, an energy level and a pitch of each of the segments, 
responsive to which parameters the segments are selected for 
use in reconstructing the output speech signal. 

20. (Original) A method according to claim 19, wherein
reconstructing the output speech signal comprises modifying 
the feature vectors of the selected segments so as to adjust 
the segment parameters of the segments in the output speech 
signal.

21. (Original) A method according to claim 16, and comprising 
determining respective degrees of voicing of the speech 
segments, and incorporating the degrees of voicing as elements 
of the feature vectors for use in reconstructing the output 
speech signal.

22. (Canceled) 

23. (Original) A method according to claim 16, wherein the 
window functions are non-zero only within different, 
respective spectral windows and have variable values over 
their respective windows, and wherein integrating the spectral 
envelopes comprises calculating products of the spectral 
envelopes with the window functions, and calculating integrals 
of the products over the respective windows of the window 
functions. 

24. (Original) A method according to claim 23, and comprising
applying a mathematical transformation to the integrals in 
order to determine the elements of the feature vectors. 

25. (Original) A method according to claim 24, wherein the
frequency domain comprises a Mel frequency domain, and wherein 
applying the mathematical transformation comprises applying 
log and discrete cosine transform operations in order to 
determine Mel Frequency Cepstral Coefficients to be used as 
the elements of the feature vectors. 
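
For illustration only, the chain recited in claims 23 to 25 (products of the spectral envelope with window functions, integrals over each window, then log and discrete cosine transform) can be sketched as follows. This is a minimal sketch under stated assumptions: the triangular window shape, the common 2595·log10(1 + f/700) Mel mapping, the Riemann-sum integration, and every function name and default parameter below are choices made for the example, not details taken from the claims.

```python
import math
from typing import List, Sequence, Tuple


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def triangular_windows(n_windows: int, f_max: float) -> List[Tuple[float, float, float]]:
    """(low, center, high) edges in Hz, equally spaced on the Mel scale."""
    top = hz_to_mel(f_max)
    hz = [mel_to_hz(top * i / (n_windows + 1)) for i in range(n_windows + 2)]
    return [(hz[i], hz[i + 1], hz[i + 2]) for i in range(n_windows)]


def window_value(f: float, lo: float, c: float, hi: float) -> float:
    """Triangular window: non-zero only inside (lo, hi), varying over it."""
    if lo < f <= c:
        return (f - lo) / (c - lo)
    if c < f < hi:
        return (hi - f) / (hi - c)
    return 0.0


def feature_vector(envelope: Sequence[float], f_max: float,
                   n_windows: int = 8, n_coeffs: int = 4) -> List[float]:
    """envelope: spectral envelope sampled uniformly on [0, f_max] Hz."""
    df = f_max / (len(envelope) - 1)
    # Integral of (envelope x window) over each window, as a Riemann sum.
    integrals = [
        sum(envelope[k] * window_value(k * df, lo, c, hi) * df
            for k in range(len(envelope)))
        for lo, c, hi in triangular_windows(n_windows, f_max)
    ]
    logs = [math.log(max(v, 1e-12)) for v in integrals]  # floor avoids log(0)
    # DCT-II of the log integrals gives cepstral-style coefficients.
    return [
        sum(logs[m] * math.cos(math.pi * n * (m + 0.5) / n_windows)
            for m in range(n_windows))
        for n in range(n_coeffs)
    ]
```

One property of this formulation: scaling the envelope by a constant gain changes only coefficient 0, since the cosine terms of the higher coefficients sum to zero over the windows.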

26. (Currently amended) A device for speech synthesis, 
comprising:

a memory, arranged to hold a segment inventory 
comprising, for a plurality of speech segments, respective 
sequences of feature vectors having vector elements determined 
by estimating spectral envelopes of input speech signals 
corresponding to the speech segments in a succession of time 
intervals during each of the speech segments, and integrating 
the spectral envelopes over a plurality of window functions in 
a frequency domain; and 

a speech processor, arranged to receive phonetic and 
prosodic information indicative of an output speech signal to 
be generated, to select the sequences of feature vectors from 
the inventory responsive to the phonetic and prosodic 
information, to process the selected sequences of feature 
vectors so as to generate a concatenated output series of 
feature vectors in a frequency domain, and to compute a series
of complex line spectra of the output signal from the series 
of the feature vectors and transform the complex line spectra
to a time domain speech signal for output. 

27. (Original) A device according to claim 26, wherein the 
segment inventory comprises segment information comprising 
respective phonetic identifiers of the segments, and wherein 
the processor is arranged to select the sequences of feature 
vectors by finding the segments in the inventory whose 
phonetic identifiers are close to the received phonetic 
information.

28. (Original) A device according to claim 27, wherein the 
segments comprise lefemes, and wherein the phonetic
identifiers comprise lefeme labels.

29. (Original) A device according to claim 27, wherein the 
segment information further comprises one or more prosodic 
parameters with respect to each of the segments, and wherein 
the processor is arranged to select the sequences of feature 
vectors by finding the segments whose one or more prosodic 
parameters are close to the received prosodic information. 

30. (Original) A device according to claim 29, wherein the 
one or more prosodic parameters are selected from a group of 
parameters consisting of a duration, an energy level and a 
pitch of each of the segments. 

31. (Original) A device according to claim 26, wherein the
feature vectors comprise auxiliary vector elements indicative 
of further features of the speech segments, in addition to the 
elements determined by integrating the spectral envelopes of 
the input speech signals. 

32. (Original) A device according to claim 31, wherein the 
auxiliary vector elements comprise voicing vector elements 
indicative of a degree of voicing of frames of the 
corresponding speech segments, and wherein the processor is 
arranged to reconstruct the output speech signal with the 
degree of voicing indicated by the voicing vector elements. 

33. (Original) A device according to claim 32, wherein the 
prosodic information comprises pitch values, and wherein the 
processor is arranged to adjust a frequency spectrum of the 
output speech signal responsive to the pitch values. 

34. (Original) A device according to claim 26, wherein the 
processor is arranged to select the sequences of feature 
vectors by selecting candidate segments from the inventory, 
computing a cost function for each of the candidate segments 
responsive to the phonetic and prosodic information and to the 
feature vectors of the candidate segments, and selecting the 
segments so as to minimize the cost function. 

35. (Original) A device according to claim 26, wherein the
processor is arranged to adjust the feature vectors in the 
combined output series responsive to the prosodic information. 

36. (Original) A device according to claim 35, wherein the 
prosodic information comprises respective durations of the 
segments to be incorporated in the output speech signal, and 
wherein the processor is arranged to adjust the feature 
vectors by removing one or more of the feature vectors from 
the selected sequences so as to shorten the durations of one 
or more of the segments. 

37. (Original) A device according to claim 35, wherein the 
prosodic information comprises respective durations of the 
segments to be incorporated in the output speech signal, and 
wherein the processor is arranged to adjust the feature 
vectors by adding one or more further feature vectors to the 
selected sequences so as to lengthen the durations of one or 
more of the segments. 

38. (Original) A device according to claim 35, wherein the 
prosodic information comprises respective energy levels of the 
segments to be incorporated in the output speech signal, and 
wherein the processor is arranged to adjust the energy levels 
of one or more of the segments by altering one or more of the 
vector elements. 

39. (Original) A device according to claim 26, wherein the
processor is arranged to adjust the vector elements so as to 
provide a smooth transition between the segments in the time 
domain signal. 

40. (Original) A device according to claim 26, wherein the 
vector elements comprise Mel Frequency Cepstral Coefficients 
of the speech segments, determined based on the integrated 
spectral envelopes.

41. (Currently amended) A device for speech synthesis, 
comprising:

a memory, arranged to hold a segment inventory determined 
by processing an input speech signal containing a set of 
speech segments so as to estimate spectral envelopes of the 
input speech signal in a succession of time intervals during 
each of the speech segments, and integrating the spectral 
envelopes over a plurality of window functions in a frequency 
domain so as to determine elements of feature vectors 
corresponding to the speech segments; and 

a speech processor, arranged to reconstruct an output 
speech signal by concatenating the feature vectors 
corresponding to a sequence of the speech segments to form a 
series in a frequency domain, computing a series of complex 
line spectra of the output signal from the series of feature 
vectors, and transforming the complex line spectra to a time
domain signal.

42. (Original) A device according to claim 41, wherein the 
input speech signal is processed by dividing the input speech 
signal into the segments and determining segment information 
comprising respective phonetic identifiers of the segments, 
and wherein the processor is arranged to reconstruct the 
output speech signal by selecting the segments whose feature 
vectors are to be concatenated responsive to the segment 
information determined with respect to the segments. 

43. (Original) A device according to claim 42, wherein the 
input speech signal is divided into lefemes, and the phonetic
identifiers comprise lefeme labels.

44. (Original) A device according to claim 42, wherein the 
segment information further comprises respective segment 
parameters including one or more of a duration, an energy 
level and a pitch of each of the segments, responsive to which 
parameters the segments are selected by the processor for use 
in reconstructing the output speech signal. 

45. (Original) A device according to claim 44, wherein the 
processor is arranged to modify the feature vectors of the 
selected segments so as to adjust the segment parameters of
the segments in the output speech signal. 

46. (Original) A device according to claim 41, wherein the 
feature vectors comprise respective degrees of voicing of the 
speech segments, for use by the processor in reconstructing 
the output speech signal.

47. (Canceled)

48. (Previously presented) A device according to claim 41, 
wherein the window functions are non-zero only within 
different, respective spectral windows and have variable 
values over their respective windows, and wherein the feature 
vector elements are determined by calculating products of the 
spectral envelopes with the window functions, and calculating 
integrals of the products over the respective windows of the 
window functions. 

49. (Original) A device according to claim 48, wherein a
mathematical transformation is applied to the integrals in 
order to determine the elements of the feature vectors. 

50. (Original) A device according to claim 48, wherein the 
frequency domain comprises a Mel frequency domain, and wherein 
the mathematical transformation comprises log and discrete 
cosine transform operations, which are applied so as to
determine Mel Frequency Cepstral Coefficients to be used as
the elements of the feature vectors. 

51. (Currently amended) A computer software product, 
comprising a computer-readable medium in which program
instructions are stored, which instructions, when read by a 
computer, cause the computer to access a segment inventory 
comprising, for a plurality of speech segments, respective 
sequences of feature vectors having vector elements determined 
by estimating spectral envelopes of input speech signals 
corresponding to the speech segments in a succession of time 
intervals during each of the speech segments, and integrating 
the spectral envelopes over a plurality of window functions in 
a frequency domain, and in response to phonetic and prosodic 
information indicative of an output speech signal to be 
generated, cause the computer to select the sequences of 
feature vectors from the inventory responsive to the phonetic 
and prosodic information, to process the selected sequences of 
feature vectors so as to generate a concatenated output series 
of feature vectors in a frequency domain, and to compute a
series of complex line spectra of the output signal from the 
series of the feature vectors and transform the complex line 
spectra to a time domain speech signal for output. 

52. (Original) A product according to claim 51, wherein the
segment inventory comprises segment information comprising 
respective phonetic identifiers of the segments, and wherein 
the instructions cause the computer to select the sequences of 
feature vectors by finding the segments in the inventory whose 
phonetic identifiers are close to the received phonetic 
information.

53. (Original) A product according to claim 52, wherein the 
segments comprise lefemes, and wherein the phonetic
identifiers comprise lefeme labels.

54. (Original) A product according to claim 52, wherein the 
segment information further comprises one or more prosodic 
parameters with respect to each of the segments, and wherein 
the instructions cause the computer to select the sequences of 
feature vectors by finding the segments whose one or more 
prosodic parameters are close to the received prosodic 
information.

55. (Original) A product according to claim 54, wherein the
one or more prosodic parameters are selected from a group of 
parameters consisting of a duration, an energy level and a 
pitch of each of the segments. 

56. (Original) A product according to claim 54, wherein the
feature vectors comprise auxiliary vector elements indicative 
of further features of the speech segments, in addition to the 
elements determined by integrating the spectral envelopes of 
the input speech signals. 

57. (Original) A product according to claim 56, wherein the 
auxiliary vector elements comprise voicing vector elements 
indicative of a degree of voicing of frames of the 
corresponding speech segments, and wherein the instructions 
cause the computer to reconstruct the output speech signal 
with the degree of voicing indicated by the voicing vector 
elements.

58. (Original) A product according to claim 57, wherein the 
prosodic information comprises pitch values, and wherein the 
instructions cause the computer to adjust a frequency spectrum 
of the output speech signal responsive to the pitch values. 

59. (Original) A product according to claim 51, wherein the 
instructions cause the computer to select the sequences of 
feature vectors by selecting candidate segments from the 
inventory, computing a cost function for each of the candidate 
segments responsive to the phonetic and prosodic information 
and to the feature vectors of the candidate segments, and 
selecting the segments so as to minimize the cost function. 

60. (Original) A product according to claim 51, wherein the
instructions cause the computer to adjust the feature vectors 
in the combined output series responsive to the prosodic 
information. 

61. (Original) A product according to claim 60, wherein the 
prosodic information comprises respective durations of the 
segments to be incorporated in the output speech signal, and 
wherein the instructions cause the computer to adjust the 
feature vectors by removing one or more of the feature vectors 
from the selected sequences so as to shorten the durations of 
one or more of the segments. 

62. (Original) A product according to claim 60, wherein the 
prosodic information comprises respective durations of the 
segments to be incorporated in the output speech signal, and 
wherein the instructions cause the computer to adjust the 
feature vectors by adding one or more further feature vectors 
to the selected sequences so as to lengthen the durations of 
one or more of the segments. 

Appln. No. 09/901,031
Amd dated September 6, 2005
Reply to Office Action of June 8, 2005

63. (Original) A product according to claim 60, wherein the
prosodic information comprises respective energy levels of the
segments to be incorporated in the output speech signal, and
wherein the instructions cause the computer to adjust the
energy levels of one or more of the segments by altering one
or more of the vector elements.

64. (Original) A product according to claim 51, wherein the 
instructions cause the computer to adjust the vector elements 
so as to provide a smooth transition between the segments in 
the time domain signal. 

65. (Original) A product according to claim 51, wherein the
vector elements comprise Mel Frequency Cepstral Coefficients 
of the speech segments, determined based on the integrated 
spectral envelopes. 

66. (Currently amended) A computer software product, 
comprising a computer-readable medium in which a segment 
inventory is stored, the inventory having been determined by 
processing an input speech signal containing a set of speech 
segments so as to estimate spectral envelopes of the input 
speech signal in a succession of time intervals during each of 
the speech segments, and integrating the spectral envelopes 
over a plurality of window functions in a frequency domain so 
as to determine elements of feature vectors corresponding to 
the speech segments, so that a speech processor can
reconstruct an output speech signal by concatenating the 
feature vectors corresponding to a sequence of the speech 
segments to form a series in a frequency domain, computing a 
series of complex line spectra of the output signal from the
series of feature vectors, and transforming the complex line
spectra to a time domain signal.

67. (Original) A product according to claim 66, wherein the 
input speech signal is processed by dividing the input speech 
signal into the segments and determining segment information 
comprising respective phonetic identifiers of the segments, 
and wherein to reconstruct the output speech signal, the 
processor selects the segments whose feature vectors are to be 
concatenated responsive to the segment information determined 
with respect to the segments. 

68. (Original) A product according to claim 66, wherein the 
input speech signal is divided into lefemes, and the phonetic
identifiers comprise lefeme labels.

69. (Original) A product according to claim 66, wherein the 
segment information further comprises respective segment 
parameters including one or more of a duration, an energy
level and a pitch of each of the segments, responsive to which
parameters the segments are selected by the computer for use 
in reconstructing the output speech signal. 

70. (Original) A product according to claim 69, wherein to 
reconstruct the output speech signal, the instructions cause 
the computer to modify the feature vectors of the selected
segments so as to adjust the durations and energy levels of 
the segments in the output speech signal. 

71. (Original) A product according to claim 66, wherein the 
feature vectors comprise respective degrees of voicing of the 
speech segments, for use by the computer in reconstructing the 
output speech signal. 

72. (Canceled) 

73. (Original) A product according to claim 66, wherein the 
window functions are non-zero only within different, 
respective spectral windows and have variable values over 
their respective windows, and wherein the feature vector 
elements are determined by calculating products of the 
spectral envelopes with the window functions, and calculating 
integrals of the products over the respective windows of the 
window functions. 

74. (Original) A product according to claim 73, wherein a
mathematical transformation is applied to the integrals in 
order to determine the elements of the feature vectors. 

75. (Original) A product according to claim 74, wherein 
the frequency domain comprises a Mel frequency domain, and 
wherein the mathematical transformation comprises log and 
discrete cosine transform operations, which are applied so as
to determine Mel Frequency Cepstral Coefficients to be used as 
the elements of the feature vectors. 